Implementing BLAKE with AVX, AVX2, and XOP

نویسندگان

  • Samuel Neves
  • Jean-Philippe Aumasson
چکیده

In 2013 Intel will release the AVX2 instructions, which introduce 256-bit singleinstruction multiple-data (SIMD) integer arithmetic. This will enable desktop and server processors from this vendor to support 4-way SIMD computation of 64-bit add-rotate-xor algorithms, as well as 8-way 32-bit SIMD computations. AVX2 also includes interesting instructions for cryptographic functions, like any-to-any permute and vectorized table-lookup. In this paper, we explore the potential of AVX2 to speed-up the SHA-3 finalist BLAKE, and present the first working assembly implementations of BLAKE-256 and BLAKE-512 with AVX2. We then investigate the potential of the recent AVX and XOP instructions to accelerate BLAKE, and report new speed records on Sandy Bridge and Bulldozer microarchitectures (7.47 and 11.64 cycles per byte for BLAKE-256, 5.71 and 6.95 for BLAKE-512).

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

BLAKE and 256-bit advanced vector extensions

Intel recently documented its AVX2 instruction set extension that introduces support for 256-bit wide single-instruction multiple-data (SIMD) integer arithmetic over double (32-bit) and quad (64-bit) words. This will enable Intel’s future processors—starting with the Haswell architecture, to be released in 2013—to fully support 4-way SIMD com­ putation of 64-bit ARX algorithms (32-bit is alread...

متن کامل

Intel Skylake review

The Skylake Xeon (called Xeon Scalable processor) introduced a number of innovations, notably AVX-512 vectorization instructions capable of 8-wide double precision vectors (previous AVX2 had 4-wide DP vectors). This change in itself has a potential of doubling performance of floating point codes. Other changes include CPU core optimizations, rearchitecture of the caches, and new, mesh-based top...

متن کامل

New ways to boost molecular dynamics simulations

We describe a set of algorithms that allow to simulate dihydrofolate reductase (DHFR, a common benchmark) with the AMBER all-atom force field at 160 nanoseconds/day on a single Intel Core i7 5960X CPU (no graphics processing unit (GPU), 23,786 atoms, particle mesh Ewald (PME), 8.0 Å cutoff, correct atom masses, reproducible trajectory, CPU with 3.6 GHz, no turbo boost, 8 AVX registers). The new...

متن کامل

High-Speed Signatures from Standard Lattices

At CT-RSA 2014 Bai and Galbraith proposed a lattice-based signature scheme optimized for short signatures and with a security reduction to hard standard lattice problems. In this work we first refine the security analysis of the original work and propose a new 128-bit secure parameter set chosen for software efficiency. Moreover, we increase the acceptance probability of the signing algorithm t...

متن کامل

RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies

MOTIVATION Phylogenies are increasingly used in all fields of medical and biological research. Moreover, because of the next-generation sequencing revolution, datasets used for conducting phylogenetic analyses grow at an unprecedented pace. RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analyses of large datasets under maximum likelihood. Since the last R...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • IACR Cryptology ePrint Archive

دوره 2012  شماره 

صفحات  -

تاریخ انتشار 2012